KAPPA: A Serverless Approach to Metadata Enrichment and Unstructured Data Management

Enterprise AI has entered a new phase in 2026, as plans move into production with mixed results. Raw data volume is no longer the issue it once was. Accurate, precise data curation is what IT leaders now face, and it is no easy feat across petabytes of highly distributed unstructured data.

Making this file and object data safe, usable and searchable for AI means that IT must enrich its metadata. Better metadata is not only a pathway to more accurate, relevant AI outcomes but also to lower costs, since far less data needs to be sent to AI processing.

Komprise Cofounder and President Krishna Subramanian discusses the latest product update from Komprise.

What is Komprise announcing?

Krishna Subramanian: Komprise AI Preparation & Process Automation (KAPPA) data services, which is a first-of-its-kind serverless compute offering for unstructured data. It is included in the Komprise Intelligent Data Management platform and is currently available for early access. AI needs high-quality unstructured data which requires rich metadata extraction. The challenge enterprises face is that the metadata for unstructured data is often contextual and specific to their enterprise, their industry and their security constraints. With KAPPA, IT can rapidly deliver custom data services, such as industry-specific metadata enrichment, without having to provision or manage the infrastructure to process the operation across large datasets. Read the press release.


Komprise AI Preparation & Process Automation (KAPPA)

The term serverless usually applies to physical infrastructure. Can you explain the serverless capabilities that Komprise delivers?

KS: Hyperscalers have used the term “serverless” in compute to mean that they provision and scale the infrastructure while the developer focuses solely on their code. Similarly, when applying data processing to unstructured data, there is complex “infrastructure” management that Komprise handles automatically, including:

  • Providing a global view of all the unstructured data across silos so you can pick and choose the right data to operate on with Komprise Analysis and the Komprise Global Metadatabase;
  • Provisioning the compute infrastructure and scaling it with parallelism to run the operation via Komprise Observers;
  • Spinning up any cloud infrastructure as needed before executing the data processing;
  • Iterating the data processing function across a large dataset, which can easily span petabytes of data and billions of files;
  • Scaling elastically to handle the load as needed;
  • Spinning down and deprovisioning any cloud resources as needed once the operation is complete;
  • Retaining any tagging from the data processing operation so it persists.
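The lifecycle above can be sketched in a few lines of Python. This is an illustrative model only, not Komprise's implementation: the function names are assumptions, a thread pool stands in for provisioned cloud compute, and the returned dictionary stands in for persisted tags.

```python
# Illustrative sketch of the serverless lifecycle: provision compute,
# fan a per-file function out in parallel, retain the tags, deprovision.
# Names and structure are hypothetical, not the Komprise API.
from concurrent.futures import ThreadPoolExecutor


def run_data_service(files, per_file_fn, max_workers=8):
    """Apply per_file_fn to every file and retain the resulting tags."""
    results = {}
    # "Provisioning": here just a worker pool; in the real service,
    # cloud compute scaled elastically via Komprise Observers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for path, tags in zip(files, pool.map(per_file_fn, files)):
            results[path] = tags  # tagging persists after the run
    # Leaving the `with` block tears the pool down, standing in for
    # deprovisioning cloud resources once the operation completes.
    return results
```

The key point of the serverless model is that the user supplies only `per_file_fn`; everything else in this loop is handled by the platform.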

Why is there a need for KAPPA data services?

KS: As we know, unstructured data is diverse, large, and needed for AI. Yet because of its siloed nature across hybrid cloud environments and because it lacks context and is not easily searchable, unstructured data is difficult to leverage. Additionally, data quality issues are slowing down the pace of AI projects and contributing to hallucinations and a lack of trust in the outcomes. Gartner predicts that through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data.

Metadata enrichment through data tagging is a foundational capability in solving quality issues because it supplies structure and identifying traits to the data. That way, data stakeholders can search for and curate precisely the data their projects require. Often, enterprise IT organizations have unique metadata and data preparation requirements that are not consistent across industries or even companies, requiring custom processing.

How do IT teams and data scientists enrich metadata today?

KS: Custom metadata extraction has typically been handled with ETL and other traditional data processing approaches using pre-built connectors and plug-ins. These methods are often time-consuming, inflexible, and costly to maintain. Creating just one custom data operation for an AI data workflow could take months. Such timelines are untenable today, given the growing urgency around AI and how quickly requirements and projects evolve.

How do KAPPA data services work?

KS: Users (IT and data experts) simply insert a few lines of Python code for the requested actions per file into a field in Komprise. The solution then performs the steps to execute the operation across a specified dataset. We call these discrete tasks “KAPPA functions,” and examples include:

  • Read custom metadata headers from medical DICOM or genome BAM files for tagging
  • Electronic Lab Notebook metadata extraction
  • Microsoft Purview tag synchronization
  • Media image info extraction (EXIF, XMP, IPTC, etc.)
  • PDF, ERP, CRM metadata and project information extraction
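To make the model concrete, here is a minimal sketch of what a per-file KAPPA function might look like. Everything here is an assumption for illustration: the function name, the returned tag dictionary, and the sidecar-JSON convention are hypothetical, not Komprise's actual interface.

```python
# Hypothetical per-file function: given one file path, return the tags
# to attach to that file. The platform iterates it across the dataset.
import json
import os


def kappa_function(path: str) -> dict:
    """Return a dictionary of tags for a single file."""
    tags = {"ext": os.path.splitext(path)[1].lstrip(".").lower()}
    # Assumed convention: a sidecar JSON file may carry project metadata.
    sidecar = path + ".meta.json"
    if os.path.exists(sidecar):
        with open(sidecar) as f:
            meta = json.load(f)
        tags["project"] = meta.get("project", "unknown")
    return tags
```

A real function would typically parse format-specific headers instead (DICOM tags, BAM headers, EXIF fields), but the shape is the same: a few lines of Python mapping one file to its tags.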

Why is this important for customers?

KS: IT professionals are being asked to evaluate and integrate new AI tools and AI-ready infrastructure, create optimal security and governance stacks for modern workforces and move data to the right locations for analysis. Metadata management is crucial to the big picture but can be painstaking.

IT teams can now enrich metadata across petabytes with just a few lines of code. This is critical for leveraging unstructured data in enterprise AI projects, which often have specific regulatory, industry and departmental requirements. KAPPA data services make the process of defining such custom actions remarkably simpler for IT.

The result is AI data workflows that are more accurate and faster to execute. These workflows can also help improve data security and data governance; for instance, a KAPPA function can apply sensitivity labels to Microsoft Purview.

Why is this important for partners?

KS: Resellers and System Integrators provide customer value by understanding the customer’s unique requirements and tailoring solutions to their needs. Consulting practices at resellers can now leverage the KAPPA data services platform to provide a library of custom data services tailored to their enterprise customers’ needs.

For example, a reseller can create a custom KAPPA function to clean up interim genomics BAM files associated with a specific genome sequence. Or, they can tag all media files with the associated ERP invoice status for a Media & Entertainment company, enabling IT to set up workflows to tier and archive completed media projects once they are invoiced. Such customizations take under an hour, instead of the months required to write a custom plug-in.
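The Media & Entertainment example might look something like the sketch below. The ERP lookup table, the filename convention (project ID before the first underscore), and the function name are all assumptions for illustration, not a real integration.

```python
# Hypothetical sketch: tag a media file with its ERP invoice status so
# workflows can tier and archive completed projects once invoiced.
def invoice_status_tags(path: str, erp_index: dict) -> dict:
    """Look up the project ID embedded in the filename and tag its status.

    Assumes filenames like "PRJ001_final.mov" and an erp_index dict
    exported from the ERP system, mapping project ID -> invoice status.
    """
    filename = path.rsplit("/", 1)[-1]
    project_id = filename.split("_")[0]
    return {"invoice_status": erp_index.get(project_id, "unknown")}
```

Files tagged `invoice_status: invoiced` could then be picked up by a Komprise tiering workflow, while untagged or pending files stay on primary storage.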

Learn more at Komprise.com/KAPPA
